 neural radiance field


Compact Neural Volumetric Video Representations with Dynamic Codebooks

Neural Information Processing Systems

This paper addresses the challenge of representing high-fidelity volumetric videos with low storage cost. Some recent feature grid-based methods have shown superior performance in quickly learning implicit neural representations from input 2D images. However, such explicit representations easily lead to large model sizes when modeling dynamic scenes. To solve this problem, our key idea is to reduce the spatial and temporal redundancy of feature grids, which exists intrinsically due to the self-similarity of scenes. To this end, we propose a novel neural representation, named dynamic codebook, which first merges similar features for model compression and then compensates for the potential decline in rendering quality with a set of dynamic codes. Experiments on the NHR and DyNeRF datasets demonstrate that the proposed approach achieves state-of-the-art rendering quality while achieving greater storage efficiency.
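The idea of merging similar features into shared codes and then patching quality-critical cells can be illustrated with a generic vector-quantization sketch. This is not the paper's implementation: the abstract does not say how features are merged or how dynamic codes are stored, so the nearest-code assignment and the per-cell override table (`dynamic_codes`) below are assumptions made purely for illustration, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Vector], codebook: List[Vector]) -> List[int]:
    """Replace each grid feature by the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
        for f in features
    ]


def reconstruct(
    indices: List[int],
    codebook: List[Vector],
    dynamic_codes: Dict[int, Vector],
) -> List[List[float]]:
    """Rebuild the feature grid from shared codes.

    ASSUMPTION: `dynamic_codes` maps a grid-cell position to a feature that
    overrides the shared code for that cell. This only mimics the idea of
    compensating for quality lost through merging.
    """
    return [
        list(dynamic_codes.get(cell, codebook[code]))
        for cell, code in enumerate(indices)
    ]


features = [[0.0, 0.1], [0.05, 0.1], [1.0, 1.0], [0.9, 1.1]]
codebook = [[0.0, 0.1], [1.0, 1.0]]
indices = quantize(features, codebook)
restored = reconstruct(indices, codebook, {3: [0.9, 1.1]})
```

Storing one small integer per cell plus a short codebook is what saves space in a scheme like this; the override table trades some of that saving back for accuracy.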



Masked Space-Time Hash Encoding for Efficient Dynamic Scene Reconstruction

Neural Information Processing Systems

In this paper, we propose the Masked Space-Time Hash encoding (MSTH), a novel method for efficiently reconstructing dynamic 3D scenes from multi-view or monocular videos. Based on the observation that dynamic scenes often contain substantial static areas that result in redundancy in storage and computations, MSTH represents a dynamic scene as a weighted combination of a 3D hash encoding and a 4D hash encoding.
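A weighted combination of a static (3D) and a dynamic (4D) feature can be sketched as a simple convex blend. The abstract does not state how the weight is obtained or what range it takes, so the scalar `weight` in [0, 1] and the function name `blend_features` are assumptions for illustration only; the hash-encoding lookups themselves are omitted.

```python
from typing import List, Sequence


def blend_features(
    static_feat: Sequence[float],
    dynamic_feat: Sequence[float],
    weight: float,
) -> List[float]:
    """Blend a 3D (static) and a 4D (space-time) feature vector.

    ASSUMPTION: `weight` is a scalar in [0, 1]; 1 means the point is treated
    as fully static, 0 as fully dynamic.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(static_feat) != len(dynamic_feat):
        raise ValueError("feature vectors must have the same length")
    return [weight * s + (1.0 - weight) * d for s, d in zip(static_feat, dynamic_feat)]


mixed = blend_features([1.0, 2.0], [3.0, 4.0], 0.5)
```

With a weight near 1 in static regions, the 4D encoding contributes little there, which is consistent with the redundancy argument in the abstract.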


DynPoint: Dynamic Neural Point For View Synthesis

Neural Information Processing Systems

The introduction of neural radiance fields has greatly improved the effectiveness of view synthesis for monocular videos. However, existing algorithms face difficulties when dealing with uncontrolled or lengthy scenarios, and require extensive training time specific to each new scenario. To tackle these limitations, we propose DynPoint, an algorithm designed to facilitate the rapid synthesis of novel views for unconstrained monocular videos. Rather than encoding the entirety of the scenario information into a latent representation, DynPoint concentrates on predicting the explicit 3D correspondence between neighboring frames to realize information aggregation. Specifically, this correspondence prediction is achieved through the estimation of consistent depth and scene flow information across frames. Subsequently, the acquired correspondence is utilized to aggregate information from multiple reference frames to a target frame, by constructing hierarchical neural point clouds. The resulting framework enables swift and accurate view synthesis for desired views of target frames. Experimental results demonstrate that our method considerably accelerates training, typically by an order of magnitude, while yielding results comparable to prior approaches. Furthermore, our method exhibits strong robustness in handling long-duration videos without learning a canonical representation of video content.
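How depth and scene flow together yield a pixel correspondence between neighboring frames can be sketched with a pinhole camera model. This is a minimal illustration, not DynPoint's pipeline: it assumes known intrinsics (`fx`, `fy`, `cx`, `cy`), z-depth, a scene flow vector expressed in the camera frame, and no camera motion between the two frames. All function names are hypothetical.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Lift pixel (u, v) with z-depth `depth` to a 3D point in camera space."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def project(p: Point3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Project a 3D camera-space point back onto the image plane."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def correspond(u: float, v: float, depth: float, flow: Point3,
               fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Find where pixel (u, v) lands in the neighboring frame.

    ASSUMPTION: the camera is static between the two frames, so the 3D scene
    flow alone moves the point.
    """
    x, y, z = backproject(u, v, depth, fx, fy, cx, cy)
    moved = (x + flow[0], y + flow[1], z + flow[2])
    return project(moved, fx, fy, cx, cy)


# A point 2 units away that moves 0.2 units to the right.
u2, v2 = correspond(120.0, 80.0, 2.0, (0.2, 0.0, 0.0), 100.0, 100.0, 100.0, 100.0)
```

Once such correspondences are available, features from reference frames can be gathered at the matched locations in the target frame.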




UE4-NeRF: Neural Radiance Field for Real-Time Rendering of Large-Scale Scene

Neural Information Processing Systems

Neural Radiance Field (NeRF) is an implicit 3D reconstruction method that has shown immense potential and has gained significant attention for its ability to reconstruct 3D scenes solely from a set of photographs. However, its real-time rendering capability, especially for interactive real-time rendering of large-scale scenes, has significant limitations. To address this challenge, we propose a novel neural rendering system called UE4-NeRF that is designed for real-time rendering of large-scale scenes. Our proposed approach partitions large scenes into subNeRFs, and uses polygonal meshes to represent them. In order to represent the partitioned independent scene, we initialize polygonal meshes by constructing multiple regular octahedra within the scene and the vertices of the polygonal faces are continuously optimized during the training process. Drawing inspiration from the Level of Detail (LOD) techniques, we train meshes with varying levels of detail for different observation levels. Our approach combines with the rasterization pipeline in Unreal Engine 4 (UE4), achieving real-time rendering of large-scale scenes at 4K resolution with a frame rate of up to 43 FPS. Our experimental results demonstrate that our method attains rendering quality on par with state-of-the-art approaches, while additionally offering the advantage of real-time performance.
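The mesh initialization from regular octahedra can be illustrated with plain geometry. The sketch below only builds one regular octahedron around a given center. How the octahedra are placed in the scene and how vertices are optimized during training is not described in the abstract, so `center`, `radius`, and the function name are hypothetical parameters chosen for illustration.

```python
from itertools import product
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


def regular_octahedron(center: Point3, radius: float) -> Tuple[List[Point3], List[Face]]:
    """Return the 6 vertices and 8 triangular faces of a regular octahedron.

    Vertices sit at distance `radius` from `center` along each axis.
    Faces are index triples into the vertex list; winding order is not
    made consistent here.
    """
    cx, cy, cz = center
    vertices = [
        (cx + radius, cy, cz), (cx - radius, cy, cz),  # indices 0, 1: +x, -x
        (cx, cy + radius, cz), (cx, cy - radius, cz),  # indices 2, 3: +y, -y
        (cx, cy, cz + radius), (cx, cy, cz - radius),  # indices 4, 5: +z, -z
    ]
    # Each face joins one x-axis, one y-axis, and one z-axis vertex.
    faces = list(product((0, 1), (2, 3), (4, 5)))
    return vertices, faces


vertices, faces = regular_octahedron((0.0, 0.0, 0.0), 1.0)
```

In a training setup, these vertex positions would be the quantities that get adjusted, starting from this regular shape.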



PDF: Point Diffusion Implicit Function for Large-scale Scene Neural Representation

Neural Information Processing Systems

The BlendedMVS [7] dataset is a large-scale synthetic dataset for multi-view stereo containing 113 scenes, which can be further divided into a large-scale outdoor scenes part and a small-scale objects part according to the scene scale. Since current large-scene NeRF methods use one model per scene, to save computational resources and time, we select the first five scenes of the large-scale outdoor scenes part and compare with Mip-NeRF 360 [2], which is the optimal baseline on the representative subset of the OMMO dataset [3] as shown in our manuscript; see Tab. 4 and Figure 1.